Portable Knowledge Sources for Machine Translation
Abstract
In this paper, we describe the acquisition and organization of knowledge sources for machine translation (MT) systems. It has been pointed out by many users that one of the most annoying things about MT systems is the repeated occurrence of identical errors in word sense and attachment disambiguation. We show the limitations of a conventional user-dictionary method and explain how our approach solves the problem.

1. Introduction

In the last decade, more and more commercial machine translation (MT) systems have become available for a wide variety of language pairs. An MT system is a very handy tool, but one quickly finds out that it makes the same errors over and over again even if a user dictionary is carefully maintained. There are several reasons for such repeated errors.

1. Commercial MT systems are not built in accordance with a powerful lexical semantic formalism. The user dictionary alone cannot disambiguate word senses and phrasal attachments satisfactorily.
2. MT systems cannot handle the domain and context dependency of word sense, phrasal attachment, and word selection.
3. In a shared environment, each user has a different user dictionary, and must therefore redundantly correct the same errors as all the other users.

A powerful lexical semantic approach [s] could give more accurate translation, but it might be too much to ask users to develop their dictionaries within that formalism. The simple structure of a user dictionary also restricts the learning ability of MT systems during the post-editing process. The second of the above reasons has motivated recent example-based and case-based machine translation research [9, s, 10]. However, a method for finding the best-matching cases in a case base, where cases (or examples) are collected from different domains or contexts, has not been studied well. Nor is it known whether considering the frequency of cases gives a better result.
The third reason is rarely discussed, but it is not desirable simply to share a single user dictionary, since the dictionary may become inconsistent by reflecting multiple users' updates. McRoy [s] discussed word sense disambiguation using multiple knowledge sources, but her method is still dictionary-based. Some of the commercial systems for human-aided translation, such as the Translation Manager/2 [~1~], can provide the user with more flexible access to multiple dictionaries and the translation memory (a repository of pairs of source and target sentences). This …
Related resources
Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
Although neural machine translation has made significant progress recently, how to integrate multiple overlapping, arbitrary prior knowledge sources remains a challenge. In this work, we propose to use posterior regularization to provide a general framework for integrating prior knowledge into neural machine translation. We represent prior knowledge sources as features in a log-linear model, wh...
Knowledge sources for disambiguating highly ambiguous verbs in machine translation
Word sense disambiguation (WSD) is one of the most challenging outstanding problems in current machine translation systems. An effective proposal in this context will rely on the use of relevant knowledge sources. Moreover, it must perform better than the current traditional approaches. We present some experiments with machine learning algorithms traditionally applied to WSD, aiming to discove...
Automated Dictionary Extraction for "Knowledge-Free" Example-Based Translation
An Example-Based Machine Translation system is supplied with a sentence-aligned bilingual corpus, but no other knowledge sources. Using the knowledge implicit in the corpus, it generates a bilingual word-for-word dictionary for alignment during translation. With such an automatically-generated dictionary, the system covers (with equivalent quality) more of its input on unseen texts than the same...
Multilingual Computer-based Communication and Language Processing: Lithuanian Case
The article gives a short overview of language processing technology in Lithuania and focuses on the problem of developing computer-based translation from English to Lithuanian. Common machine translation models are discussed, and a conceptual hierarchical model for a computer-based translation system from English into Lithuanian is proposed. It is based on hierarchical blackboard architecture and includes...
In-Depth Knowledge-Based Machine Translation
The development of an integrated knowledge-based machine-aided translation system called PANGLOSS in collaboration with the Center for Machine Translation (CMT) at CMU and the Computing Research Laboratory (CRL) at New Mexico State University. The ISI part of the collaboration is focused initially on providing the system's output capabilities, primarily in English and then in other languages, ...